Web Content Mining

Released on: June 11, 2008, 2:56 am

Press Release Author: knowlesys

Industry: Computers

Press Release Summary: Extract data from target websites
Save web content to your Excel/Access.
http://www.knowlesys.com


Press Release Body: keyword: Web Data Mining - Exploring Hyperlinks, Contents and
Usage Data
Web mining is a rapidly growing research area. It consists of Web usage mining, Web
structure mining, and Web content mining. Web usage mining refers to the discovery
of user access patterns from Web usage logs. Web structure mining tries to discover
useful knowledge from the structure of hyperlinks. Web content mining aims to
extract/mine useful information or knowledge from web page contents. This tutorial
focuses on Web Content Mining.
Web content mining is related to, but different from, data mining and text mining. It
is related to data mining because many data mining techniques can be applied in Web
content mining. It is related to text mining because much Web content is text.
However, it differs from data mining because Web data are mainly semi-structured
and/or unstructured, while data mining deals primarily with structured data. It also
differs from text mining because of the semi-structured nature of the Web, while text
mining focuses on unstructured text. Web content mining thus requires creative
applications of data mining and/or text mining techniques as well as its own unique
approaches. In the past few years, there has been a rapid expansion of activity in
the Web content mining area. This is not surprising given the phenomenal growth of
Web content and the significant economic benefit of such mining. However, due to the
heterogeneity and lack of structure of Web data, automated discovery of targeted or
unexpected knowledge still presents many challenging research problems. In this
tutorial, we will examine the following important Web content mining problems and
discuss existing techniques for solving them. Some other emerging problems will also
be surveyed.
• Data/information extraction: Our focus will be on extraction of structured data
from Web pages, such as products and search results. Extracting such data allows one
to provide services. Two main types of techniques, machine learning and automatic
extraction, are covered.
• Web information integration and schema matching: Although the Web contains a huge
amount of data, each Web site (or even page) represents similar information
differently. How to identify or match semantically similar data is a very important
problem with many practical applications. Some existing techniques and problems are
examined.
• Opinion extraction from online sources: There are many online opinion sources,
e.g., customer reviews of products, forums, blogs and chat rooms. Mining opinions
(especially consumer opinions) is of great importance for marketing intelligence and
product benchmarking. We will introduce a few tasks and techniques to mine such
sources.
• Knowledge synthesis: Concept hierarchies or ontologies are useful in many
applications. However, generating them manually is very time consuming. A few
existing methods that explore the information redundancy of the Web will be
presented. The main application is to synthesize and organize the pieces of
information on the Web to give the user a coherent picture of the topic domain.
• Segmenting Web pages and detecting noise: In many Web applications, one only wants
the main content of the Web page, without advertisements, navigation links, and
copyright notices. Automatically segmenting a Web page to extract its main content
is an interesting problem. A number of interesting techniques have been proposed in
the past few years.
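As a minimal sketch of the data/information extraction task in the list above, the
following Python example pulls product records out of a small HTML page using only
the standard library. The sample page, the class names (product, name, price), and
the names ProductExtractor and extract_products are illustrative assumptions, not
part of any particular tool. Real pages are far less regular, which is where the
machine learning and automatic extraction techniques mentioned above come in.

```python
from html.parser import HTMLParser

# Hypothetical sample page. The markup and the class names "product",
# "name" and "price" are assumptions made for this illustration only.
SAMPLE_HTML = """
<html><body>
<div id="nav"><a href="/">Home</a> <a href="/about">About</a></div>
<ul>
  <li class="product"><span class="name">Widget</span> <span class="price">$9.99</span></li>
  <li class="product"><span class="name">Gadget</span> <span class="price">$24.50</span></li>
</ul>
<div id="footer">Copyright notice</div>
</body></html>
"""


class ProductExtractor(HTMLParser):
    """Collects one dict per <li class="product"> element."""

    def __init__(self):
        super().__init__()
        self.records = []     # finished records, in page order
        self._current = None  # record being built, or None outside a product
        self._field = None    # field name being read, or None

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li" and cls == "product":
            self._current = {}
        elif tag == "span" and self._current is not None and cls in ("name", "price"):
            self._field = cls

    def handle_data(self, data):
        # Only text inside a recognized field is kept; navigation links,
        # footers and whitespace between tags are ignored.
        if self._field is not None:
            previous = self._current.get(self._field, "")
            self._current[self._field] = previous + data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            self.records.append(self._current)
            self._current = None


def extract_products(html):
    """Return a list of {"name": ..., "price": ...} dicts found in html."""
    parser = ProductExtractor()
    parser.feed(html)
    parser.close()
    return parser.records


if __name__ == "__main__":
    for record in extract_products(SAMPLE_HTML):
        print(record)
```

The resulting list of dictionaries could then be written out with Python's csv
module for use in a spreadsheet. The hand-written rules here only work because the
page layout is known in advance.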
All these tasks present major research challenges, and their solutions also have
immediate real-life applications. The tutorial will start with a short motivation for
Web content mining. We then discuss the differences between Web content mining and
text mining, and between Web content mining and data mining. This is followed by a
presentation of the above problems and current state-of-the-art techniques. Various
examples will also be given to help participants better understand how this
technology can be deployed to help businesses. All parts of the tutorial will
have a mix of research and industry flavor, addressing seminal research concepts and
looking at the technology from an industry angle.

For more information, please visit our website: http://www.knowlesys.com


Web Site: http://www.knowlesys.com

Contact Details: shenzhen
